Use mozinor for regression

Import the main module


In [1]:
from mozinor.baboulinet import Baboulinet


/home/jwuthri/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

Prepare the pipeline. The constructor takes the arguments below (an illustrative call using several of them follows the list).

(str) filepath: path to the CSV file
(str) y_col: the column to predict
(bool) regression: regression or classification?
(bool) process: (WARNING) apply some preprocessing to your data (tune this preprocessing with the parameters below)
(char) sep: column delimiter
(list) col_to_drop: columns you do not want to use in the prediction
(bool) derivate: for every feature combination, derive new features such as n1 * n2, n1 / n2, ...
(bool) transform: for every feature, derive log(n), sqrt(n) and square(n)
(bool) scaled: scale the data?
(bool) infer_datetime: check the type of every column and, if it is a date, build new columns from it (day, month, year, time)
(str) encoding: data encoding
(bool) dummify: create dummy variables for the categorical columns
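
A call enabling some of the optional preprocessing could look like the sketch below. The values are purely illustrative assumptions, not the settings used in this run (the run itself only passes filepath, y_col and regression).

cls = Baboulinet(
    filepath="mydata.csv",      # CSV file to load
    y_col="target",             # column to predict
    regression=True,            # regression task
    process=True,               # apply the optional preprocessing
    sep=",",                    # column delimiter
    col_to_drop=["id"],         # columns excluded from the prediction
    derivate=False,             # pairwise feature combinations (n1 * n2, n1 / n2, ...)
    transform=False,            # log(n), sqrt(n), square(n) transforms
    scaled=True,                # scale the data
    infer_datetime=False,       # build day/month/year/time columns from date columns
    encoding="utf-8",           # data encoding
    dummify=True,               # dummies for categorical variables
)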

The data file has been generated with sklearn.datasets.make_regression.
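
For reference, a file of the same shape could be produced roughly as follows. This is a minimal sketch under assumptions (the column names a..h and the (10000, 9) shape are taken from the pipeline log further down; noise and random_state are arbitrary), not the exact call used to build toto2.csv.

import pandas as pd
from sklearn.datasets import make_regression

# 10000 rows and 8 features, matching the (10000, 9) shape reported by the pipeline
X, y = make_regression(n_samples=10000, n_features=8, noise=0.1, random_state=0)
df = pd.DataFrame(X, columns=list("abcdefgh"))
df["predict"] = y
df.to_csv("toto2.csv", index=False)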


In [2]:
cls = Baboulinet(filepath="toto2.csv", y_col="predict", regression=True)

Now run the pipeline

This may take some time.

In [3]:
res = cls.babouline()


Reading the file toto2.csv
Read csv file: toto2.csv
args: {'encoding': 'utf-8-sig', 'sep': ',', 'decimal': ',', 'engine': 'python', 'filepath_or_buffer': 'toto2.csv', 'thousands': '.', 'parse_dates': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'predict'], 'infer_datetime_format': True}
Inital dtypes is a          float64
b          float64
c          float64
d          float64
e          float64
f          float64
g          float64
h          float64
predict    float64
dtype: object
Work on PolynomialFeatures: degree 1
Optimal number of clusters
(10000, 9)

    Polynomial Features: generate a new feature matrix
    consisting of all polynomial combinations of the features.
    For 2 features [a, b]:
        the degree 1 polynomial give [a, b]
        the degree 2 polynomial give [1, a, b, a^2, ab, b^2]
    ...


    ELBOW: explain the variance as a function of clusters.

Optimal number of trees
    OOB: this is the average error for each training observations,
    calculted using the trees that doesn't contains this observation
    during the creation of the tree.

Estimator DecisionTreeRegressor
    Decision Tree Regressor: poses a series of carefully crafted questions
    about the attributes of the test record with addition noisy observation.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:   16.7s finished
   Best params => {'min_samples_split': 4, 'min_samples_leaf': 4, 'max_depth': 10, 'criterion': 'mse'}
   Best Score => 0.974
Check the decision tree: 2017-08-1813:12:56.922924.png
Estimator ExtraTreesRegressor
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.303036 to fit


    ExtraTreesRegressor: as in random forests, a random subset of candidate
    features is used, but instead of looking for the most discriminative
    thresholds, thresholds are drawn at random for each candidate feature and
    the best of these randomly-generated thresholds is picked as
    the splitting rule.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    8.4s finished
   Best params => {'n_estimators': 75, 'min_samples_split': 7, 'min_samples_leaf': 3, 'max_features': 0.8, 'bootstrap': True}
   Best Score => 0.990
Estimator ElasticNetCV
    ElasticNetCV: linear regression with combined
    L1 (lasso penalty) and L2(ridge penalty) priors as regularizer.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    1.9s finished
   Best params => {'tol': 0.1, 'l1_ratio': 0.9}
   Best Score => 1.000
Estimator LassoLarsCV
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s finished
   Best params => {'normalize': True}
   Best Score => 1.000
Estimator RidgeCV
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished
   Best params => {}
   Best Score => 1.000
Estimator XGBRegressor
    LassoLarsCV: performs L1 regularization, it adds a factor of sum of
    absolute value of coefficients in the optimization objective.
    Usefull with lot of features, made some feature selection.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 2 candidates, totalling 6 fits

    RidgeCV: performs on L2 regularization, it adds a factor of sum of squares
    of coefficients in the optimization objective.
    Usefull with higly correlated features.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits

    Gradient boosting is an approach where new models are created that predict
    the residuals or errors of prior models and then added together to make
    the final prediction. It is called gradient boosting because it uses a
    gradient descent algorithm to minimize the loss when adding new models.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:   14.4s finished
   Best params => {'subsample': 0.8, 'n_estimators': 75, 'min_child_weight': 6, 'max_depth': 8, 'learning_rate': 0.1}
   Best Score => 0.996
Work on PolynomialFeatures: degree 2
Optimal number of clusters
    Polynomial Features: generate a new feature matrix
    consisting of all polynomial combinations of the features.
    For 2 features [a, b]:
        the degree 1 polynomial give [a, b]
        the degree 2 polynomial give [1, a, b, a^2, ab, b^2]
    ...


    ELBOW: explain the variance as a function of clusters.

Optimal number of trees
    OOB: this is the average error for each training observations,
    calculted using the trees that doesn't contains this observation
    during the creation of the tree.

Estimator DecisionTreeRegressor
    Decision Tree Regressor: poses a series of carefully crafted questions
    about the attributes of the test record with addition noisy observation.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:  1.8min finished
   Best params => {'min_samples_split': 8, 'min_samples_leaf': 7, 'max_depth': 10, 'criterion': 'mae'}
   Best Score => 0.967
Check the decision tree: 2017-08-1813:18:33.319946.png
Estimator ExtraTreesRegressor
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.570446 to fit


    ExtraTreesRegressor: as in random forests, a random subset of candidate
    features is used, but instead of looking for the most discriminative
    thresholds, thresholds are drawn at random for each candidate feature and
    the best of these randomly-generated thresholds is picked as
    the splitting rule.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    7.5s finished
   Best params => {'n_estimators': 10, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': 0.7, 'bootstrap': False}
   Best Score => 0.991
Estimator ElasticNetCV
    ElasticNetCV: linear regression with combined
    L1 (lasso penalty) and L2(ridge penalty) priors as regularizer.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    3.3s finished
   Best params => {'tol': 0.5, 'l1_ratio': 0.9}
   Best Score => 1.000
Estimator LassoLarsCV
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s finished
   Best params => {'normalize': True}
   Best Score => 1.000
Estimator RidgeCV
    LassoLarsCV: performs L1 regularization, it adds a factor of sum of
    absolute value of coefficients in the optimization objective.
    Usefull with lot of features, made some feature selection.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 2 candidates, totalling 6 fits

    RidgeCV: performs on L2 regularization, it adds a factor of sum of squares
    of coefficients in the optimization objective.
    Usefull with higly correlated features.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.1s finished
   Best params => {}
   Best Score => 1.000
Estimator XGBRegressor
    Gradient boosting is an approach where new models are created that predict
    the residuals or errors of prior models and then added together to make
    the final prediction. It is called gradient boosting because it uses a
    gradient descent algorithm to minimize the loss when adding new models.

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:   25.0s finished
   Best params => {'subsample': 0.1, 'n_estimators': 100, 'min_child_weight': 10, 'max_depth': 5, 'learning_rate': 0.1}
   Best Score => 0.992
                                            Estimator     Score  Degree
0   LassoLarsCV(copy_X=True, cv=None, eps=2.220446...  1.000000       1
1   LassoLarsCV(copy_X=True, cv=None, eps=2.220446...  1.000000       2
2   RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_...  1.000000       1
3   RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_...  1.000000       2
4   ElasticNetCV(alphas=None, copy_X=True, cv=None...  0.999896       1
5   ElasticNetCV(alphas=None, copy_X=True, cv=None...  0.999896       2
6   XGBRegressor(base_score=0.5, colsample_bylevel...  0.996338       1
7   XGBRegressor(base_score=0.5, colsample_bylevel...  0.992464       2
8   (ExtraTreeRegressor(criterion='mse', max_depth...  0.990852       2
9   (ExtraTreeRegressor(criterion='mse', max_depth...  0.990334       1
10  DecisionTreeRegressor(criterion='mse', max_dep...  0.974015       1
11  DecisionTreeRegressor(criterion='mae', max_dep...  0.967276       2
    Stacking: is a model ensembling technique used to combine information
    from multiple predictive models to generate a new model.

task:   [regression]
metric: [r2_score]

model 0: [LassoLarsCV]
    ----
    MEAN:   [1.00000000]

model 1: [RidgeCV]
    ----
    MEAN:   [1.00000000]

model 2: [ElasticNetCV]
    ----
    MEAN:   [0.99989605]

model 3: [XGBRegressor]
    ----
    MEAN:   [0.99643045]

model 4: [ExtraTreesRegressor]
    ----
    MEAN:   [0.99009462]

model 5: [DecisionTreeRegressor]
    ----
    MEAN:   [0.97408961]

Stacking 6 models: 100%|██████████| 63/63 [00:41<00:00,  1.33it/s]

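The stacking idea can be illustrated outside mozinor with plain scikit-learn: out-of-fold predictions of the first-level models become the features of a second-level model. This is only a conceptual sketch with assumed models, not mozinor's actual implementation.

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LassoLarsCV, RidgeCV
from sklearn.tree import DecisionTreeRegressor

def stack_fit_predict(X_train, y_train, X_test):
    first_level = [LassoLarsCV(), RidgeCV()]
    # Out-of-fold predictions of the first-level models become new features
    train_meta = np.column_stack(
        [cross_val_predict(m, X_train, y_train, cv=3) for m in first_level])
    test_meta = np.column_stack(
        [m.fit(X_train, y_train).predict(X_test) for m in first_level])
    # The second-level model is trained on those predictions
    second_level = DecisionTreeRegressor(max_depth=10)
    second_level.fit(train_meta, y_train)
    return second_level.predict(test_meta)
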
The class instance now contains two objects: the best model for this data and the best stacking for this data.

mozinor can auto-generate the code of these models.

Generate the code for the best model


In [4]:
cls.bestModelScript()


Check script file toto2_solo_model_script.py
Out[4]:
'toto2_solo_model_script.py'

Generate the code for the best stacking


In [5]:
cls.bestStackModelScript()


Check script file toto2_stack_model_script.py
Out[5]:
'toto2_stack_model_script.py'
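
To look at what was generated, either script can simply be read and printed (standard Python, nothing mozinor-specific; the same works for toto2_stack_model_script.py):

with open("toto2_solo_model_script.py") as f:
    print(f.read())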

To check which model is the best

Best model


In [6]:
res.best_model


Out[6]:
Estimator    LassoLarsCV(copy_X=True, cv=None, eps=2.220446...
Score                                                        1
Degree                                                       1
Name: 0, dtype: object

In [7]:
show = """
    Model: {},
    Score: {}
"""
print(show.format(res.best_model["Estimator"], res.best_model["Score"]))


    Model: LassoLarsCV(copy_X=True, cv=None, eps=2.2204460492503131e-16,
      fit_intercept=True, max_iter=500, max_n_alphas=1000, n_jobs=1,
      normalize=True, positive=False, precompute='auto', verbose=False),
    Score: 1.0
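
If you want to reuse the winning estimator directly, something like the sketch below may work. It assumes the object stored in res.best_model["Estimator"] is an already-fitted scikit-learn estimator and that it was fitted on the raw columns of toto2.csv; if mozinor applied preprocessing (scaling, polynomial features, ...), the same transformation would have to be applied first.

import pandas as pd

df = pd.read_csv("toto2.csv")
X = df.drop(columns=["predict"])     # features only, target column removed

best = res.best_model["Estimator"]   # assumed to be a fitted estimator
predictions = best.predict(X)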

Best stacking


In [8]:
res.best_stack_models


Out[8]:
Fit1stLevelEstimator    [LassoLarsCV(copy_X=True, cv=None, eps=2.22044...
Fit2ndLevelEstimator    DecisionTreeRegressor(criterion='mse', max_dep...
Score                                                            0.999963
Degree                                                                  1
Name: 0, dtype: object

In [9]:
show = """
    FirstModel: {},
    SecondModel: {},
    Score: {}
"""
print(show.format(res.best_stack_models["Fit1stLevelEstimator"], res.best_stack_models["Fit2ndLevelEstimator"], res.best_stack_models["Score"]))


    FirstModel: [LassoLarsCV(copy_X=True, cv=None, eps=2.2204460492503131e-16,
      fit_intercept=True, max_iter=500, max_n_alphas=1000, n_jobs=1,
      normalize=True, positive=False, precompute='auto', verbose=False), RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_intercept=True, gcv_mode=None,
    normalize=False, scoring=None, store_cv_values=False), XGBRegressor(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0,
       learning_rate=0.1, max_delta_step=0, max_depth=8,
       min_child_weight=6, missing=None, n_estimators=75, nthread=-1,
       objective='reg:linear', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=0.8)],
    SecondModel: DecisionTreeRegressor(criterion='mse', max_depth=10, max_features=None,
           max_leaf_nodes=None, min_impurity_split=1e-07,
           min_samples_leaf=4, min_samples_split=4,
           min_weight_fraction_leaf=0.0, presort=False, random_state=None,
           splitter='best'),
    Score: 0.9999634403159083